In this paper we present TruFor, a forensic framework that can be applied to a large variety of image manipulation methods, from classic cheapfakes to more recent manipulations based on deep learning. We rely on the extraction of both high-level and low-level traces through a transformer-based fusion architecture that combines the RGB image and a learned noise-sensitive fingerprint. The latter learns to embed the artifacts related to the camera internal and external processing by training only on real data in a self-supervised manner. Forgeries are detected as deviations from the expected regular pattern that characterizes each pristine image. Looking for anomalies makes the approach able to robustly detect a variety of local manipulations, ensuring generalization. In addition to a pixel-level localization map and a whole-image integrity score, our approach outputs a reliability map that highlights areas where localization predictions may be error-prone. This is particularly important in forensic applications in order to reduce false alarms and allow for a large scale analysis. Extensive experiments on several datasets show that our method is able to reliably detect and localize both cheapfakes and deepfakes manipulations outperforming state-of-the-art works. Code will be publicly available at https://grip-unina.github.io/TruFor/
translated by 谷歌翻译
得益于深度学习的最新进展,如今存在复杂的生成工具,这些工具产生了极其现实的综合语音。但是,这种工具的恶意使用是可能的,有可能对我们的社会构成严重威胁。因此,合成语音检测已成为一个紧迫的研究主题,最近提出了各种各样的检测方法。不幸的是,它们几乎没有概括为在训练阶段从未见过的工具产生的合成音频,这使他们不适合面对现实世界的情况。在这项工作中,我们旨在通过提出一种仅利用说话者的生物特征的新检测方法来克服这个问题,而无需提及特定的操纵。由于仅在实际数据上对检测器进行训练,因此可以自动确保概括。建议的方法可以基于现成的扬声器验证工具实现。我们在三个流行的测试集上测试了几种这样的解决方案,从而获得了良好的性能,高概括能力和高度鲁棒性。
translated by 谷歌翻译
Face manipulation technology is advancing very rapidly, and new methods are being proposed day by day. The aim of this work is to propose a deepfake detector that can cope with the wide variety of manipulation methods and scenarios encountered in the real world. Our key insight is that each person has specific biometric characteristics that a synthetic generator cannot likely reproduce. Accordingly, we extract high-level audio-visual biometric features which characterize the identity of a person, and use them to create a person-of-interest (POI) deepfake detector. We leverage a contrastive learning paradigm to learn the moving-face and audio segment embeddings that are most discriminative for each identity. As a result, when the video and/or audio of a person is manipulated, its representation in the embedding space becomes inconsistent with the real identity, allowing reliable detection. Training is carried out exclusively on real talking-face videos, thus the detector does not depend on any specific manipulation method and yields the highest generalization ability. In addition, our method can detect both single-modality (audio-only, video-only) and multi-modality (audio-video) attacks, and is robust to low-quality or corrupted videos by building only on high-level semantic features. Experiments on a wide variety of datasets confirm that our method ensures a SOTA performance, with an average improvement in terms of AUC of around 3%, 10%, and 4% for high-quality, low quality, and attacked videos, respectively. https://github.com/grip-unina/poi-forensics
translated by 谷歌翻译
虚假图像的更高质量和广泛传播已经为可靠的法医制作产生了追求。最近已经提出了许多GaN图像探测器。然而,在现实世界的情景中,他们中的大多数都表现出有限的鲁棒性和泛化能力。此外,它们通常依赖于测试时间不可用的侧面信息,即它们不是普遍的。我们研究了这些问题,并基于有限的子采样架构和合适的对比学习范例提出了一种新的GaN图像检测器。在具有挑战性的条件下进行的实验证明了提出的方法是迈向通用GaN图像检测的第一步,确保对常见的图像障碍以及看不见的架构的良好概括。
translated by 谷歌翻译
综合产生的内容的广泛扩散是一种需要紧急对策的严重威胁。合成含量的产生不限于多媒体数据,如视频,照片或音频序列,但涵盖了可以包括生物图像的显着大面积,例如西幕和微观图像。在本文中,我们专注于检测综合生成的西幕图像。生物医学文献在很大程度上探讨了西部污染图像,已经表明了如何通过目视检查或标准取证检测器轻松地伪造这些图像。为了克服缺乏公开可用的数据集,我们创建了一个包含超过14k原始的西幕图像和18K合成的Western-Blot图像的新数据集,由三种不同的最先进的生成方法产生。然后,我们调查不同的策略来检测合成的Western印迹,探索二进制分类方法以及单级探测器。在这两种情况下,我们从不利用培训阶段的合成纤维图像。所达到的结果表明,即使在这些科学图像的合成版本未优化利用检测器,综合生成的西幕图像也可以具有良好的精度。
translated by 谷歌翻译
Figure 1: FaceForensics++ is a dataset of facial forgeries that enables researchers to train deep-learning-based approaches in a supervised fashion. The dataset contains manipulations created with four state-of-the-art methods, namely, Face2Face, FaceSwap, DeepFakes, and NeuralTextures.
translated by 谷歌翻译
Modelling the temperature of Electric Vehicle (EV) batteries is a fundamental task of EV manufacturing. Extreme temperatures in the battery packs can affect their longevity and power output. Although theoretical models exist for describing heat transfer in battery packs, they are computationally expensive to simulate. Furthermore, it is difficult to acquire data measurements from within the battery cell. In this work, we propose a data-driven surrogate model (LiFe-net) that uses readily accessible driving diagnostics for battery temperature estimation to overcome these limitations. This model incorporates Neural Operators with a traditional numerical integration scheme to estimate the temperature evolution. Moreover, we propose two further variations of the baseline model: LiFe-net trained with a regulariser and LiFe-net trained with time stability loss. We compared these models in terms of generalization error on test data. The results showed that LiFe-net trained with time stability loss outperforms the other two models and can estimate the temperature evolution on unseen data with a relative error of 2.77 % on average.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
The class of bigraphical lasso algorithms (and, more broadly, 'tensor'-graphical lasso algorithms) has been used to estimate dependency structures within matrix and tensor data. However, all current methods to do so take prohibitively long on modestly sized datasets. We present a novel tensor-graphical lasso algorithm that analytically estimates the dependency structure, unlike its iterative predecessors. This provides a speedup of multiple orders of magnitude, allowing this class of algorithms to be used on large, real-world datasets.
translated by 谷歌翻译
我们介绍了IST和Unmabel对WMT 2022关于质量估计(QE)的共享任务的共同贡献。我们的团队参与了所有三个子任务:(i)句子和单词级质量预测;(ii)可解释的量化宽松;(iii)关键错误检测。对于所有任务,我们在彗星框架之上构建,将其与OpenKIWI的预测估计架构连接,并为其配备单词级序列标记器和解释提取器。我们的结果表明,在预处理过程中合并参考可以改善下游任务上多种语言对的性能,并且通过句子和单词级别的目标共同培训可以进一步提高。此外,将注意力和梯度信息结合在一起被证明是提取句子级量化量化宽松模型的良好解释的首要策略。总体而言,我们的意见书在几乎所有语言对的所有三个任务中都取得了最佳的结果。
translated by 谷歌翻译